Surveillance of communicable diseases using social media: A systematic review

Background: Communicable diseases pose a severe threat to public health and economic growth. The traditional methods used for public health surveillance, however, involve many drawbacks, such as being labor intensive to operate and producing a lag between data collection and reporting. To effectively address the limitations of these traditional methods and to mitigate the adverse effects of these diseases, a proactive and real-time public health surveillance system is needed. Previous studies have indicated the usefulness of performing text mining on social media.

Objective: To conduct a systematic review of the literature that used textual content published to social media for the surveillance and prediction of communicable diseases.

Methodology: Broad search queries were formulated and performed in four databases. Both journal articles and conference materials were included. The quality of the studies, operationalized as reliability and validity, was assessed. This qualitative systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Results: Twenty-three publications were included in this systematic review. All studies reported positive results for using textual social media content to surveil communicable diseases. Most studies used Twitter as the source of these data. Influenza was studied most frequently, while other communicable diseases received far less attention. Journal articles had higher quality (reliability and validity) than conference papers. However, studies often failed to provide important information about procedures and implementation.

Conclusion: Text mining of health-related content published on social media can serve as a novel and powerful tool for the automated, real-time, and remote monitoring of public health and for the surveillance and prediction of communicable diseases in particular. This tool can address limitations related to traditional surveillance methods and has the potential to supplement them for public health surveillance.

People increasingly share their health-related experiences on social media [26,27]. Various studies have indicated that the analysis of health-related content published to social media has the potential to significantly improve public health surveillance systems [28-30]. In addition, in recent years, various studies have utilized health-related textual content from social media for the public health surveillance of communicable diseases. Furthermore, three reviews [31-33] have been performed on the topic of internet-based public health surveillance; while these reviews provide new insights into the vast opportunities of using social media content for public health surveillance, they are not systematic reviews.
To our knowledge, only five systematic reviews that are somewhat related to this topic have been conducted thus far. First, Velasco et al. [34] found that although incorporating digital content as a source for public health surveillance has great potential, there is a reluctance among public health authorities to include this content in their surveillance systems. Second, Charles-Smith et al. [35] found that analyzing content published to social media has the potential to improve public health, but they based their findings on only 10 publications. Third, Fung et al. [36] performed a systematic review of 12 studies that utilized social media content published during the 2014-2015 Ebola epidemic in West Africa, and they reported that no study evaluated its utility for any public health organization. These reviews were, among other limitations, not tailored to communicable diseases [34,35] or emphasized only one regional epidemic [36]. Fourth, Abad et al. [37] conducted a scoping review to summarize the literature on applications of natural language processing for digital public health surveillance, but their literature search strategy emphasized databases in the fields of medicine and public health; not including relevant databases in the fields of computer science, data science, and information science is a limitation of their study. Fifth, Gupta and Katarya [38] performed a systematic review on the utilization of social media data in real-time public health surveillance systems and concluded that, compared to traditional methods, the analysis of social media data has increased the ability of these systems to predict diseases. However, their systematic review differs from ours in two respects: their literature search covered all types of artificial intelligence instead of focusing on the branch of natural language processing, and, as a consequence, they placed limited emphasis on the technical details of natural language processing, such as how the preprocessing of natural language was performed and which methods and tools were used.

We believe that a new systematic review is required that emphasizes the technical aspects of these applications of natural language processing for the surveillance of communicable diseases. In addition, considering the changes in social media use and the advances in the respective fields of science in the half decade since the reviews above were published, some of these reviews may already be outdated. Therefore, in this paper, we perform a new and thorough systematic literature review that investigates how textual content published to social media can be used for the surveillance and prediction of communicable diseases. A systematic review of the evidence on this topic can greatly benefit public health authorities. In addition to evidence about the effectiveness of specific methods, this systematic review provides a synthesis of the communicable diseases that were studied, the social media platforms that were used, and the software and algorithms that were utilized in these studies. If textual content from social media can indeed be used to surveil and predict outbreaks of communicable diseases, then such systems may become a powerful tool and asset for public health authorities, with the potential to address most of the limitations of the methods commonly used in traditional public health surveillance systems [28-30].
There is an opportunity to develop a proactive global public health surveillance system [23]. Such a tool should enable the automated and real-time monitoring of diseases worldwide by including information from various novel sources containing contextual information about social media users while minimizing the overall processing time from data collection to the reporting of identified findings [24]. This tool could significantly benefit rapid and evidence-based decision-making regarding infectious disease outbreaks [24]. A systematic review could provide more insight into this opportunity.

Background
Public health surveillance, also called epidemiologic surveillance, involves the ongoing and systematic collection, management, and monitoring of data about diseases, with the purpose of identifying trends, e.g., [39-41]. The overall objective of public health surveillance is to detect outbreaks of diseases at the earliest possible time so that the required preparatory activities can be planned and performed and sufficient health resources can be allocated to enable high-quality and timely public health interventions intended to mitigate the disease [42,43]. In addition, once the disease finally appears, the authorities, medical professionals, and the entire society can immediately initiate the planned remediating activities, facilitating an effective and prompt intervention. Therefore, public health surveillance is a crucial system for the identification, prevention, and control of disease outbreaks [44] while enabling a better allocation of health resources [18].

Traditional system for surveillance
In the traditional system for public health surveillance, the responsible public health authorities continuously collect data on diseases, which are primarily derived from diagnosed cases that are reported by emergency departments, hospitals, laboratories, and other medical professionals [1,45]. The identified illnesses are predominantly observed from clinical data such as diagnoses and clinical reports [45,46]. It has, however, been argued that these passive surveillance strategies fail to provide complete and timely overviews of the diseases [47].
In addition, historical data are analyzed to identify and visualize disease-related trends. For example, seasonal influenza has occurred around the same months throughout the preceding decades and may, therefore, be predicted with reasonable accuracy [6,48]. In contrast, other researchers report that the influenza virus continuously evolves into slightly different variants each year, which makes forecasting the timing of influenza outbreaks, as well as their impact on society, very difficult [49]. However, the emergence of many other infectious diseases cannot be forecast from historical data at all [47,50].

Limitations of the traditional system for surveillance
The methods that are commonly used in the traditional system for public health surveillance have been practiced for many decades. Although these systems are known to improve public health and reduce mortality, there is no consensus on the degree of usefulness of individual methods or on the best way to support their function [51]. Likewise, other literature has reported that the authorities have been unable to successfully reduce the incidence and prevalence of dengue and other mosquito-related epidemics [52].
Overall, these systems involve two significant limitations. First, a major drawback is that these methods are inefficient and time-consuming [1,20]. To identify confirmed cases, the system requires lab work that is very labor intensive to operate and maintain, which significantly increases the time required to process the clinical data [6]. For example, in the United States, the time required to collect and analyze the data about seasonal influenza and to produce and distribute the reports was estimated to be two weeks [49]. As a consequence, once these reports are finally distributed to politicians, medical professionals, and the general public, the reported findings are very likely to be outdated and may no longer accurately represent the current situation [53]. Therefore, these methods are not suited for the surveillance of novel infectious diseases such as SARS-CoV-2, which emerged in late 2019 [8] and created an urgent need for real-time updates and immediate interventions [19]. While contact tracing can successfully trace infections, asymptomatic and mild cases are nearly impossible to track and can easily enter other countries unnoticed.
Second, for communicable diseases such as malaria, disease trends can only be detected and analyzed after the actual outbreak of this disease [47,50]. A severe limitation is that the outbreak and distribution of such diseases cannot be forecasted reliably [54].
Consequently, in our highly dynamic society, the interconnectivity of all major cities by air travel makes it very likely that an infectious disease outbreak will spread around the globe in a matter of days, especially when cases are asymptomatic or mild [47,55,56].

Value of understanding text published to social media
Humans spend a significant amount of their time on social media communicating and disseminating information [4]. Social media platforms provide access to an abundance of valuable and public user-generated data that may be useful for public health surveillance and to detect, monitor, and prevent diseases [19,45,57,58]. This makes social media platforms an important source for generating new knowledge [19].
A distinctive feature of social media is that it transforms its users into human sensors, although potentially biased and unreliable, who personally report on a variety of events and who may provide additional contextual information [6]. Furthermore, social media platforms often also collect geographical information about the precise locations of their users, which adds an additional and potentially valuable geographical dimension to these data [19,59].
The analysis of textual content from social media is, however, not restricted to disease surveillance; an abundance of studies have used social media data in many other domains.
For example, user-generated content has been analyzed in a wide variety of domains, such as agriculture [60], business [61], and consumer behavior [62]; it has been used for purposes ranging from the detection of earthquakes [63], emergency and disaster management [64] to understanding migraines [65], presidential elections [66], political campaigns [67], and product design [68], to predicting the revenue of movies [69], forecasting sports events [70], identifying the topical interests of users [71], identifying trending topics [72], and investigating voting patterns in elections [73].

Natural language processing
Most data that are generated today have an unstructured format, e.g., text [74]. Only a small fraction of data has a structured format, which can be analyzed directly with well-established techniques from data mining [74]. The unstructured nature of social media content, in contrast, demands much more preprocessing and processing before it can be analyzed [50].
In the preceding years, extensive techniques for processing human language have been developed and refined, and the domain that emerged from this work is called natural language processing (hereafter NLP) [75,76]. NLP is the science of using computers to understand human language, while text mining provides the required methods and algorithms.
The purpose of text mining is to "discover novel information in a timely manner from large-scale text collections by developing high performance algorithms for sourcing and converting unstructured textual data to a machine understandable format and then filtering this according to the needs of its users" [75]. Therefore, text mining is used for the automatic discovery of patterns, relationships, and high-quality insights from textual data [77,78].
These techniques are used abundantly by both researchers and professionals [50]. Sentiment analysis is a technique that is frequently used in the domain of text mining. It involves the identification of the attitudes, emotions, and opinions that people hold toward an entity, as observed from expressed human language [89,90]. Applying sentiment analysis to content from social media may enable innovative applications [20]. For example, sentiment analysis enables the classification of content as either fact or opinion (also called subjectivity). In addition, for opinions, sentiment analysis can identify polarity, namely, whether an opinion is positive, neutral, or negative.
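As a minimal illustration of polarity detection, the following Python sketch uses the VADER analyzer that ships with the Natural Language Toolkit; the example posts are hypothetical and are not drawn from the reviewed studies.

```python
# Minimal polarity-detection sketch using NLTK's VADER analyzer.
# The posts below are invented examples, not data from the reviewed studies.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

posts = [
    "Day three of the flu, fever and chills, feeling awful.",
    "The health department published updated influenza figures today.",
]

for post in posts:
    scores = analyzer.polarity_scores(post)  # neg/neu/pos plus compound in [-1, 1]
    if scores["compound"] > 0.05:
        polarity = "positive"
    elif scores["compound"] < -0.05:
        polarity = "negative"
    else:
        polarity = "neutral"
    print(f"{polarity:>8} ({scores['compound']:+.2f}): {post}")
```

The thresholds of ±0.05 follow a convention commonly suggested for VADER's compound score; they are a tunable choice rather than a fixed standard.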

Methodology
This qualitative systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [111,112] (see S2 Appendix). However, most of the reviewed papers did not contain controlled trials, comparable statistical analysis, or comparable methodologies, making it impossible to apply the entire PRISMA checklist to this review. Therefore, we only applied items on the checklist if they were applicable, and thus, our review does not conform completely to the guidelines.
The following search strategy and procedures for study selection and analysis were used. The study selection, quality assessment of the included studies, and thematic analysis were performed by one author (PP). However, the procedures and findings were discussed by all authors, and potential disagreements were resolved by consensus.

Information sources
This systematic review is based on literature that was indexed by four large databases, namely, the ACM Digital Library, IEEE Xplore, PubMed, and Web of Science. These databases were selected because of their relevance to this topic.
The ACM Digital Library and IEEE Xplore databases were searched for publications in the fields of computer science, data science, information management, and information technology. IEEE Xplore was also selected because much research on this topic is published exclusively at conferences instead of in peer-reviewed journals, and the Institute of Electrical and Electronics Engineers (IEEE) hosts many of the relevant conferences. Furthermore, PubMed was included because of its focus on literature in the domain of medicine and healthcare, while Web of Science is a very broad database that indexes literature from many relevant disciplines, such as public policy and the social sciences.

Search strategy
An optimized and broad search strategy was formulated for each of the four databases (see S1 Appendix). Overall, the search strategy consisted of two blocks with search terms related to natural language processing and public health monitoring. In addition, database-specific filters were applied to narrow the search results further.
The first block, natural language processing, contained the search terms artificial intelligence, machine learning, text mining, computational linguistics, natural language processing, sentiment analysis, word embeddings, and Natural Language Toolkit. Abbreviations and wildcards were included to find alternative phrasing of these concepts. The OR operator was used to combine these search terms.
The second block, public health monitoring, contained the search terms public health surveillance, public health monitoring, and health monitoring. Experimental searches indicated that these broader search terms yielded the most relevant results. The OR operator was used to combine these search terms.
If supported by the database, subject headings such as MeSH terms for PubMed were also included in the search strategy. Subsequently, the AND operator was used to combine the queries from each block into the final search query.
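For illustration, a simplified version of the combined query could look as follows; this is a sketch of the block structure only, and the exact, database-specific queries, including abbreviations, wildcards, and subject headings, are listed in S1 Appendix.

```
("artificial intelligence" OR "machine learning" OR "text mining" OR
 "computational linguistics" OR "natural language processing" OR
 "sentiment analysis" OR "word embedding*" OR "Natural Language Toolkit")
AND
("public health surveillance" OR "public health monitoring" OR
 "health monitoring")
```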
The literature search was performed in March 2020. After executing the formulated search queries in each database, additional filters were manually applied to narrow the search results further. Although the precise filters differed across databases, examples include restricting results to publications written in English and to studies published in journals or presented at conferences.
For each of the four databases, all search results were then exported and subsequently imported into the same EndNote Library. Because these databases partially returned the same results, the deduplication strategy by Bramer et al. [113] was used to eliminate these duplicate publications from the EndNote Library. Consequently, the EndNote Library contained only unique results.
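Conceptually, this deduplication resembles keying each record by its DOI, or by a normalized title when no DOI is available, and keeping only the first occurrence. The following Python fragment sketches that logic under assumed field names; the actual procedure of Bramer et al. [113] is carried out manually within EndNote.

```python
# Conceptual deduplication sketch; field names ("doi", "title") are assumptions.
import re

def record_key(record):
    """Build a comparison key from the DOI or, failing that, a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]", "", (record.get("title") or "").lower())
    return ("title", title)

def deduplicate(records):
    """Keep the first occurrence of each record, as judged by record_key."""
    seen, unique = set(), []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```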

Process of study selection
The remaining publications were screened and selected using three subsequent phases based on their title, abstract, and full text. To avoid erroneously excluding publications, the screening in these phases was performed with high flexibility. Therefore, if there was any doubt concerning a publication's eligibility or when insufficient information was provided to confidently exclude a manuscript, that publication was retained for further screening in a subsequent phase.
In the first phase, the titles of these publications were screened for their relevance to the topic of this systematic review. The titles of eligible studies indicated the analysis of textual content for the surveillance or monitoring of diseases.
In the second phase, the abstract and keywords of the remaining studies were screened for information indicating the analysis of textual content that was generated by users and published to at least social media, with the purpose of public health surveillance and monitoring of communicable diseases. As a result, studies that only analyzed news articles were considered irrelevant and were eliminated.
Finally, the third phase involved rigorous screening of the full text of the remaining publications. Eligible studies reported original and empirical research analyzing the textual content that the general public published to at least social media, with the purpose of surveilling and monitoring public health with respect to communicable diseases. Publications that investigated only non-communicable diseases were eliminated; otherwise, this phase did not discriminate between geographies, social media platforms, or types of communicable disease, i.e., all communicable diseases were included in this systematic review.
This resulted in a remaining subset of the identified publications that was included for further selection in this systematic review.

Selection criteria
Overall, eligible publications presented original and empirical research reporting findings on the analysis of user-generated textual content from social media for the monitoring and prediction of communicable diseases. Reviews, discussion papers, editorials, and papers that only proposed a framework for the analysis of social media content without its actual application were eliminated. All peer-reviewed journal articles and conference publications were included.
In addition, although studies were considered relevant if they included textual content that was published to at least social media, this systematic review did not discriminate between the different social media platforms. All social media platforms were considered relevant and were included. Likewise, this systematic review included all papers irrespective of the language of the social media content used, the geography of these users and their content, or the authors of the identified publications.
This study aimed to aggregate the reported findings on the surveillance and monitoring of public health based on the experiences of the population that were published on social media. Therefore, papers were excluded if they only included content that was published on social media by authors other than the general public, such as governments, health professionals, and commercial entities.

Data analysis
In accordance with Kampmeijer et al. [114] and utilizing the process described by Pilipiec et al. [81], the included studies were first assessed according to their quality, which was operationalized as reliability and validity. A reliable study provided a thorough and complete description of the methods that were used for the data collection and data analysis, and this process was also considered repeatable [114]. A valid study reported results that were consistent with the research objective and the utilized research methods [114]. An ordinal scale was used to grade studies with respect to their reliability and validity as either low, medium, or high. Regardless of the quality level, all studies were included in this systematic review.
Directed qualitative content analysis, also called thematic analysis, was used to analyze the included studies [115]. Thematic analysis is a primary method for qualitative research that is widely used among qualitative researchers [116,117]. Its popularity may be explained because thematic analysis is a highly flexible method that can produce trustworthy insights [116,118].
The themes of interest were based on the objective of this systematic review. The following themes were extracted from these publications: authors, year of publication, publication type, name of communicable disease, social media platform used, sample size, language of the data, period of data collection, horizon of data collection, country, software used for natural language processing, methods and techniques used for natural language processing, investigated target, algorithm used for predicting the target, reported result, description of the results, reliability, and validity.
The extracted information from all included publications was used to create an extraction matrix. The results were summarized using tables, and a synthesis of this information was presented narratively. In addition to assessing the quality of the studies that were included in this systematic review, the PRISMA checklist [111,112] in S2 Appendix was used to assess the quality of this systematic review.
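To make the extraction matrix concrete, the record structure below illustrates how the themes listed above could be captured for one included study; the field names and types are our own illustration, not a structure taken from the reviewed studies.

```python
# Hypothetical record structure for one row of the extraction matrix.
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    authors: str
    year: int
    publication_type: str          # "journal article" or "conference proceedings"
    disease: str                   # e.g., "influenza"
    platform: str                  # e.g., "Twitter"
    sample_size: int | None        # None when the study did not disclose it
    language: str | None
    collection_period: str | None
    time_horizon_months: float | None
    country: str | None
    nlp_software: list[str] = field(default_factory=list)
    nlp_methods: list[str] = field(default_factory=list)
    target: str = ""
    prediction_algorithm: str = ""
    reported_result: str = ""
    reliability: str = "medium"    # graded "low" | "medium" | "high"
    validity: str = "medium"       # graded "low" | "medium" | "high"
```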

Results
The flow diagram in Fig 1 presents the selection of the studies included in this systematic review. The execution of the optimized search queries in the four databases (see S1 Appendix) yielded a total of 5,318 hits: 250 records were identified through the ACM Digital Library, 2,549 records were found in IEEE Xplore, PubMed yielded 226 records, and Web of Science returned 2,293 records. Of these results, 744 records were identified as duplicates and were, therefore, removed. This resulted in an EndNote Library with 4,574 unique records.
Subsequently, screening was performed in three consecutive phases to exclude irrelevant records, according to the process described in the Process of study selection and Selection criteria sections above.
In the first phase, 4,347 records were removed after screening the titles, leaving 227 records. In the second phase, the records were screened based on their abstracts and keywords; the 191 records that were considered irrelevant were eliminated, resulting in 36 remaining studies. In the third phase, the full texts of the remaining records were screened. The full texts of two records could not be retrieved, and these studies were removed. Of the remaining 34 records, 11 were considered irrelevant and were excluded. This resulted in the identification of 23 eligible publications that were included in this systematic review. A detailed description of the characteristics of these studies is presented in S3 Appendix.

Table 1 presents an extensive description of the studies that were included in this systematic review. All studies were published between 2010 and 2019. A majority of these studies (65.2%) were published in the last five years [6,30,44,45,53,58,74,76,119-125]. Most studies were published in 2015 (17.4%) [119-122] and 2016 (17.4%) [58,123-125], while no studies were published in 2012.
The results in Table 1 for the input sources, employed methods, and study effectiveness are discussed in the subsequent subsections.
Input sources

There was a vast difference in the sample sizes of the included studies, ranging from 667 tweets [76] to 171,027,275 tweets [53]. Overall, in most studies, the sample size was either less than 25,000 (34.8%) [28,45,74,76,119,124,125,128] or one million or more (30.4%) [4,6,18,30,53,123,126]. In 26.1 percent of the studies, the sample size fell between these two extremes, and the remaining two studies (8.7%) did not disclose a sample size [24,58].
The time horizon of the analyzed data, with respect to the date of publication, was also diverse, ranging from one week [74,120] to 106 months [76]. Most of the studies (39.1%) analyzed samples that were published over periods ranging from one to six months [4,6,18,24,28,44,49,122,124], and more than one-fifth of the studies (21.7%) analyzed content that was published over a period of at least 25 months [30,53,76,125,128]. Only two studies (8.7%) included content that was published over a period of less than one month [74,120], and two studies (8.7%) did not disclose the precise time horizon of the included samples [45,58].

Employed methods
In the following synthesis of the employed methods, the reader should be aware that our objective was to investigate the methods that authors utilized and explicitly mentioned in their manuscripts. We acknowledge the possibility that authors applied common methods for text analysis, such as stop word removal, tokenization, stemming, and lemmatization, but failed to report this, perhaps because these preprocessing methods are so common in natural language processing. Although all studies analyzed textual content using some variant of natural language processing, a majority of the studies (82.6%) failed to disclose information on the software that was used [4,18,24,28,30,45,49,53,58,74,76,119-122,125-128] (see Table 1). Only four studies (17.4%) provided information about the utilized software [6,44,123,124]. Among the studies that did report the software utilized, seven software packages were mentioned. Stanford CoreNLP was used most often (7.4%) [44,124], while Apache Lucene's PorterStemFilter [124], Apache Lucene's StopFilter [124], the Datasift service [123], the Natural Language Toolkit [6], OpenNLP [124], and the Stanford parser [44] were used the least (3.7% each). Studies could utilize more than one software package; Byrd et al. [124], for example, used four software packages for natural language processing.
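To illustrate the preprocessing steps named above (tokenization, stop word removal, stemming, and lemmatization), the following sketch uses the Natural Language Toolkit, one of the packages reported in the reviewed studies [6]; the example post and the exact parameter choices are our own assumptions.

```python
# Common text preprocessing steps sketched with NLTK.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

for resource in ("punkt", "stopwords", "wordnet", "omw-1.4"):
    nltk.download(resource, quiet=True)  # one-time resource downloads

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

post = "Feeling feverish and coughing all week, staying home from work."
tokens = word_tokenize(post.lower())                                  # tokenization
tokens = [t for t in tokens if t.isalpha() and t not in stop_words]   # stop word removal
stems = [stemmer.stem(t) for t in tokens]                             # stemming
lemmas = [lemmatizer.lemmatize(t) for t in tokens]                    # lemmatization
print(stems)
print(lemmas)
```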
A vast majority of the studies (87.0%) reported information on the algorithms that were used to predict the target, i.e., the outcome estimated from the textual content. A total of 18 algorithms were utilized. Support vector machines (24.4%) [4,24,30,44,45,49,53,58,125,128], linear regression (12.2%) [6,28,53,125,127], and Naïve Bayes (12.2%) [24,45,76,124,125] were used most often. These three supervised learning algorithms are highly popular among data mining practitioners, so their use for predicting a numerical outcome or a category was expected.
Although a vast majority of studies disclosed information on the algorithm used, 13.0 percent of the studies [18,121,122] did not provide such information.
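As a hedged illustration of the most frequently reported setup, the sketch below trains a support vector machine on TF-IDF features derived from short posts; scikit-learn, the toy tweets, and the binary labeling scheme are our assumptions and are not taken from any particular reviewed study.

```python
# Illustrative text classification pipeline: TF-IDF features + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented labeled posts: 1 = self-reported illness, 0 = other content.
texts = [
    "Been in bed with the flu for two days now",
    "New influenza statistics released by the health department",
    "My whole family caught the flu this week",
    "Flu vaccine clinics open downtown tomorrow",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)  # in practice, train on thousands of annotated posts

print(model.predict(["I think I'm coming down with the flu"]))
```

Counting the predicted positives over time, optionally weighted by location, is one simple way such a classifier could feed into a surveillance signal.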

Study effectiveness
All studies reported positive results on using user-generated textual content from social media to monitor or surveil communicable diseases (see Table 1). Despite these positive findings, one study explicitly noted that older males with lower education levels are less likely to disclose information about the infectious disease dengue on Twitter, making this platform less suitable for the monitoring and surveillance of this disease among this group [125].
Furthermore, the quality of the included studies was evaluated based on their reliability and validity. A majority of studies [4,24,28,30,45,49,53,58,74,76,120,121,124-127] had medium reliability (69.6%) and validity (69.6%). Four studies [6,44,119,123] were found to have high reliability (17.4%) and validity (17.4%), meaning that these studies not only provided a complete and repeatable description of the methods used for data collection and data analysis but also reported results that are consistent with the research objective and the utilized research methods [114]. For the remaining studies [18,122,128], the reliability (13.0%) and validity (13.0%) were low.

Analysis of publications by publication type
The included publications were additionally analyzed by publication type, i.e., conference proceedings and journal articles (see Table 2). Of these publications, eight studies (34.8%) were presented at a conference [4,18,24,76,121,124,127,128], and 15 studies (65.2%) were published in a peer-reviewed journal [6,28,30,44,45,49,53,58,74,119,120,122,123,125,126]. Overall, the analysis indicates that conference proceedings and journal articles reported comparable findings.
However, there are a few notable and novel differences regarding the following themes: the type of communicable disease, social media platform, geographical locations of included samples, and the quality of these studies, which was operationalized as reliability and validity.
Although both conference proceedings and journal articles investigated content that was published in various countries, journal articles relatively more often included textual content that was published in the United States [28,44,49,53,58]. However, journal articles were also more likely to lack a disclosure of geographical information [74,120,123,126]. There were, however, no notable differences between the continents.
Last, only journal articles were evaluated as having high reliability and high validity [6,44,119,123]. No conference proceedings were assessed as high on these themes. In contrast, conference proceedings were more likely to have low reliability and low validity [18,128] relative to journal articles [122]. There were no notable differences between conference proceedings and journal articles that were assessed as having a medium quality. Overall, journal articles, therefore, had a higher quality than conference proceedings.

Analysis of publications by social media platform
In addition to the analyses above, the included publications were analyzed based on the social media platforms from which the content was extracted (see Table 3). These platforms are Sina Weibo, Twitter, and Yahoo! Knowledge. Overall, comparable findings were reported across the groups of studies that utilized content from each of these platforms.

Despite the overall comparability of the results, there are several notable differences regarding the following themes: the type of communicable disease and the quality of the studies, the latter operationalized as reliability and validity.
With respect to study quality, measured as reliability and validity, publications that utilized content from Twitter scored higher overall. More specifically, half of the studies that used Sina Weibo [122] and all studies that utilized Yahoo! Knowledge [128] had low reliability and low validity, whereas relatively more studies that used Twitter were of medium quality [24,28,30,45,49,53,58,74,76,120,121,124-127]. In addition, all studies that were evaluated to be of high quality analyzed content from Twitter [6,44,119,123].
Discussion

Various techniques were used to process textual content. For example, sentiment analysis can be used to establish the subjectivity of content, such that news, which contains predominantly facts, can be distinguished from personal experiences, which contain opinions. Because the included publications predominantly studied personal experiences and, therefore, needed to exclude news, sentiment analysis or alternative strategies were used to classify this content.
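As a small, hedged sketch of such subjectivity-based filtering, the fragment below uses TextBlob's subjectivity score; TextBlob, the threshold, and the example posts are our assumptions, since the reviewed studies do not uniformly report how news-like content was excluded.

```python
# Subjectivity-based filtering sketch using TextBlob (score in [0, 1]).
from textblob import TextBlob

posts = [
    "Health officials report 120 confirmed measles cases this month.",
    "I feel terrible, pretty sure I caught the measles at school.",
]

for post in posts:
    subjectivity = TextBlob(post).sentiment.subjectivity  # 0 = factual, 1 = opinionated
    label = "personal experience" if subjectivity >= 0.5 else "news-like"
    print(f"{subjectivity:.2f}  {label}: {post}")
```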
Last, a discussion of our findings would not be complete without a reflection on Google Flu Trends. With the emergence of the internet, novel applications have been developed that collect and analyze data for the purpose of public health surveillance [19]. To address some of the challenges of the traditional methods for public health surveillance, the software company Google built Google Flu Trends, which utilized influenza-related search queries and search patterns from its users to estimate regional seasonal influenza outbreaks [6,131,132]. The underlying presumption of using search queries to predict influenza is that people who experience changes in their health status search the internet for symptoms, treatments, and other medical advice for self-diagnosis [50]. Influenza-related search queries may then be analyzed for early indications of a seasonal influenza outbreak [19]; increases or decreases in these search patterns may indicate the start or the end of the seasonal flu season, respectively [19]. This made Google Flu Trends a novel real-time and global tool for remote sensing [123]. To enable researchers and public health authorities to perform their own analyses, Google also published these historical datasets online [58]. Some studies reported that Google Flu Trends achieved higher accuracy in predicting seasonal influenza outbreaks than traditional methods [15]. For example, these search queries were used to predict seasonal influenza rates two weeks in advance with 90 percent accuracy [127]. Similarly, influenza-related hospital visits were also analyzed using Google Flu Trends [133].
However, many researchers have reported that Google Flu Trends faced many drawbacks related to its accuracy [58,124]. For example, Google Flu Trends was found to be inaccurate with respect to the variations in seasonal influenza patterns that occur on an annual basis [134]. In addition, it did not predict the 2009 A(H1N1) pandemic and performed suboptimally in forecasting subsequent seasonal influenza seasons [134-138]. Most notably, the reliability of Google Flu Trends has been seriously questioned since 2013, when it failed to predict the intensity of the seasonal influenza outbreak [139]. Others have also reported that Google Flu Trends has suboptimal performance [140].
Although Google Flu Trends remediated several limitations of traditional health surveillance methods, additional innovations are required to enable better public health surveillance [6]. Due to the repeated failures to detect infectious disease outbreaks and the shortcomings described above, Google Flu Trends was discontinued in 2015 [54]. Therefore, there is a need for alternative and more suitable surveillance methods [134,140], which we aimed to address in the present systematic review.

Limitations
This systematic review had six noteworthy limitations.
First, study selection, information extraction, quality assessment of publications, and analysis were performed by one researcher (PP). This may have introduced bias. However, the procedures and results were discussed by all authors, and disagreements were resolved by consensus.
Second, in the included publications, there was an unequal distribution of the analyzed communicable diseases. For example, studies most often reported on the effectiveness of using social media to monitor and surveil influenza, while fewer studies analyzed the effectiveness in relation to dengue and measles, and Ebola, HIV/AIDS, listeria, and tuberculosis received the least attention. Therefore, most studies reported findings on the same diseases, and it remains unknown to what extent the positive findings also hold for the infectious diseases that were studied least often.
Third, Twitter was investigated in a vast majority of studies, whereas Sina Weibo and Yahoo! Knowledge received very little attention. Additionally, other social media platforms exist, such as Facebook, that were not investigated at all. Therefore, it remains unknown whether content from other social media platforms can also be used effectively for the public health surveillance of communicable diseases, especially because these platforms may be targeted to different populations and, thus, may enable the monitoring of specific subgroups of the population.
Fourth, user-generated content published to social media is inherently noisy and biased. Most users are unqualified to assess their medical symptoms and may exaggerate mild or unrelated symptoms. Users may also act maliciously, intentionally publishing fake content or seeking to discredit competitors. We suggest considering these factors when evaluating the effectiveness of the data sources and proposed tools.
Fifth, a majority of the publications consistently failed to report important information. For example, many publications did not explicitly disclose the language and geographical origin of the included content, although this could sometimes be inferred. This is particularly relevant because a vast majority of studies used Twitter, which does record the geographical location of its users. Similarly, the software, as well as the specific methods and techniques used for natural language processing, were often omitted. In addition to this lack of information about implementation, the authors often failed to reflect on their collaboration with authorities, such as public health institutes. All studies investigated how text can be processed and understood, and the reporting of such crucial information is, therefore, essential for replicability.
Sixth, this qualitative systematic review followed the PRISMA guidelines. As discussed in the methodology section, due to the interdisciplinary nature of the reviewed studies and their limitations, we acknowledge that it was not possible to complete every item from the PRISMA checklist (see S2 Appendix). For the same reason, no PROSPERO registration was made.

Theoretical recommendations
In the following, four recommendations are suggested to researchers.
First, a vast majority of the included studies investigated influenza, clearly the most popular disease on this topic, while dengue and measles were studied to a lesser extent; it is essential that other communicable diseases also receive more attention in the literature. Many infectious diseases pose a threat to public health, and it remains unknown whether these diseases can be monitored and predicted effectively using textual content. Therefore, we recommend that other infectious diseases be studied more frequently to produce more evidence on this topic.
Second, Twitter is clearly a popular social media platform for text mining, a popularity that is partly related to the public accessibility of its content. However, many other popular platforms exist that have received far less or even no attention in the literature. It is, therefore, recommended that future research also account for those platforms. This is particularly relevant because only then can it be established whether certain platforms are more useful than others for surveilling and predicting infectious diseases, or whether different platforms yield contradictory findings.
Third, a majority of studies failed to report critical information, such as the language and geographical origin of their content and the software, methods, and techniques used for natural language processing. Including such information is essential to establish the reliability and validity of the findings and to enable other researchers to replicate the study. It is, therefore, recommended that researchers disclose such information. In addition, it is highly recommended that the software developed to collect and analyze the data be well documented and published for reuse by the community and that authors thoroughly describe the application of their NLP analysis.
Fourth, the included journal articles were overall of higher quality than conference proceedings. This difference may be partly explained by the peer review involved, which may be more elaborate for journals than for conferences. However, another explanation is related to the limited amount of important information that was disclosed about the included data, methodologies, and analyses. Therefore, it is highly recommended that researchers provide more of the information needed to establish the reliability and validity of their studies and the reported findings.

Conclusion
Our findings in this work indicate that text mining of health-related content published to social media can serve as a novel and powerful tool for the automated, real-time, and remote monitoring of public health and for the surveillance and prediction of communicable diseases in particular.
According to our results, practitioners at public health authorities may benefit from applying natural language processing to social media data for the surveillance of communicable diseases as a supplement to their traditional methods. Natural language processing provides an automated, real-time tool to analyze user-generated content, including its contextual information, in order to surveil and predict communicable diseases worldwide. This systematic review indicates that textual content from social media can be an important source of this knowledge. Another benefit of social media content is that it enables remote sensing via the internet by collecting public information. There is, however, no need to replace traditional methods, such as the collection of information about diagnosed cases from medical practitioners. Rather, we highly recommend that practitioners include textual content from social media as a supplementary data source in their public health surveillance efforts to monitor and predict communicable diseases.